Analysis and application of hop count in multi-hop wireless ad-hoc networks
Hop count, i.e., the number of wireless hops a packet has to traverse to reach its destination, is a fundamental metric in multi-hop wireless ad-hoc networks. Network performance, such as throughput, end-to-end delay, and energy consumption, depends critically on hop count. Previous work on modeling hop count is limited by unrealistic simplifying assumptions at the physical layer, the network layer, or both. A key contribution of this thesis is an analytical model that derives the probability distribution of hop count under realistic assumptions at both the physical and network layers. Specifically, the model considers log-normal shadowing radio propagation, which captures the random signal fading observed in most wireless communication environments, and the widely used geographic routing at the network layer. The model is validated by a comprehensive set of simulation experiments, including a trace-driven simulation of a real-world vehicular ad-hoc network. The model reveals that randomness in radio propagation significantly reduces the number of hops required to reach a given destination. To demonstrate the utility of the proposed hop count model, the thesis proposes three new applications that address some of the key challenges in multi-hop wireless networks. The first application derives the per-node packet forwarding load in multi-hop wireless sensor networks and reveals that nodes in the vicinity of the base station have a significantly lower forwarding load than previously thought under simplifying radio propagation and routing assumptions. The second application demonstrates that, by using hop count as a measure of the distance traveled by a data packet, geocasting can be achieved in multi-hop wireless networks even when some of the network nodes do not have access to reliable location information.
Finally, the proposed hop count model is used to evaluate the performance of the third application, which demonstrates that the overhead of geographic routing can be reduced significantly by adopting a position update strategy that adapts to the mobility and communication patterns of the underlying ad-hoc network.
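The interaction between log-normal shadowing and greedy geographic forwarding can be explored with a small simulation. The Python sketch below is not the thesis's analytical model; it is a hypothetical simulation under assumed parameters (path-loss exponent `eta`, shadowing standard deviation `sigma_db` in dB, nominal range `r0`). A link is up when the shadowed path loss does not exceed the loss at `r0`, and each hop forwards to the neighbor closest to the destination that still makes progress. With `sigma_db = 0` the link test reduces to the unit-disk model.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def link_up(d: float, rng: random.Random, eta: float = 3.0,
            sigma_db: float = 4.0, r0: float = 1.0) -> bool:
    """Assumed log-normal shadowing link test.

    Mean path loss relative to the nominal range r0 is 10*eta*log10(d/r0) dB;
    X ~ N(0, sigma_db^2) is the shadowing term. The link is up when
    10*eta*log10(d/r0) - X <= 0, i.e. unit-disk behaviour when sigma_db == 0.
    """
    if d <= 0.0:
        return True
    x = rng.gauss(0.0, sigma_db) if sigma_db > 0.0 else 0.0
    return 10.0 * eta * math.log10(d / r0) - x <= 0.0


def greedy_hop_count(nodes: List[Point], src: int, dst: int,
                     sigma_db: float = 4.0, seed: int = 0) -> Optional[int]:
    """Hops taken by greedy geographic forwarding, or None if it gets stuck."""
    rng = random.Random(seed)
    links: Dict[Tuple[int, int], bool] = {}  # one shadowing draw per node pair

    def up(i: int, j: int) -> bool:
        key = (min(i, j), max(i, j))
        if key not in links:
            links[key] = link_up(math.dist(nodes[i], nodes[j]), rng,
                                 sigma_db=sigma_db)
        return links[key]

    cur, hops = src, 0
    while cur != dst:
        best, best_d = None, math.dist(nodes[cur], nodes[dst])
        for j in range(len(nodes)):
            if j != cur and up(cur, j):
                dj = math.dist(nodes[j], nodes[dst])
                if dj < best_d:  # only neighbours that make progress
                    best, best_d = j, dj
        if best is None:
            return None  # local minimum: no neighbour is closer to dst
        cur, hops = best, hops + 1
    return hops


line = [(0.0, 0.0), (0.9, 0.0), (1.8, 0.0), (2.7, 0.0)]
print(greedy_hop_count(line, 0, 3, sigma_db=0.0))  # 3
```

Averaging `greedy_hop_count` over many random node placements with `sigma_db = 0` and with `sigma_db > 0` lets one compare hop-count distributions under the two propagation assumptions. Shadowing occasionally brings up links longer than `r0`, which is one plausible intuition for the reduction in hop count reported above.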
A Survey of Learning-based Automated Program Repair
Automated program repair (APR) aims to fix software bugs automatically and
plays a crucial role in software development and maintenance. With the recent
advances in deep learning (DL), an increasing number of APR techniques have
been proposed to leverage neural networks to learn bug-fixing patterns from
massive open-source code repositories. Such learning-based techniques usually
treat APR as a neural machine translation (NMT) task, where buggy code snippets
(i.e., source language) are translated into fixed code snippets (i.e., target
language) automatically. Benefiting from the powerful capability of DL to learn
hidden relationships from previous bug-fixing datasets, learning-based APR
techniques have achieved remarkable performance. In this paper, we provide a
systematic survey to summarize the current state-of-the-art research in the
learning-based APR community. We illustrate the general workflow of
learning-based APR techniques and detail the crucial components, including
fault localization, patch generation, patch ranking, patch validation, and
patch correctness phases. We then discuss the widely-adopted datasets and
evaluation metrics and outline existing empirical studies. We discuss several
critical aspects of learning-based APR techniques, such as repair domains,
industrial deployment, and the open science issue. We highlight several
practical guidelines on applying DL techniques for future APR studies, such as
exploring explainable patch generation and utilizing code features. Overall,
our paper can help researchers gain a comprehensive understanding of the
achievements of the existing learning-based APR techniques and promote the
practical application of these techniques. Our artifacts are publicly available
at https://github.com/QuanjunZhang/AwesomeLearningAPR.
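The patch ranking and patch validation phases described above can be illustrated with a toy Python sketch. The candidate lines and their scores stand in for the output of a hypothetical neural patch generator, the fault is assumed to be already localized to one line, and validation simply executes the patched function against test cases. A patch that passes all tests is only plausible, which is why the survey treats patch correctness as a separate phase.

```python
from typing import Dict, List, Optional, Tuple

TestCase = Tuple[tuple, object]  # (arguments, expected result)


def first_plausible_patch(buggy_src: str, fault_line: int,
                          candidates: List[Tuple[str, float]],
                          tests: List[TestCase], func_name: str) -> Optional[str]:
    """Rank candidate lines by model score, then validate each against tests.

    `fault_line` is a 0-based line index assumed to come from fault
    localization; `candidates` pairs a replacement line with a hypothetical
    model score. Returns the first patched source that passes every test.
    """
    lines = buggy_src.splitlines()
    # Patch ranking: try the highest-scoring candidates first.
    for new_line, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        patched = "\n".join(lines[:fault_line] + [new_line] + lines[fault_line + 1:])
        namespace: Dict[str, object] = {}
        try:
            exec(patched, namespace)  # untrusted code: sandbox this in practice
            fn = namespace[func_name]
            if all(fn(*args) == expected for args, expected in tests):
                return patched  # plausible patch (may still overfit the tests)
        except Exception:
            continue  # uncompilable or crashing patch: discard it
    return None


buggy = "def max2(a, b):\n    if a < b:\n        return a\n    return b"
candidates = [("    if a <= b:", 0.9), ("    if a > b:", 0.6), ("    if a != b:", 0.1)]
tests = [((1, 2), 2), ((3, 1), 3), ((2, 2), 2)]
# Prints the patched source with `if a > b:`; the top-scored candidate fails the tests.
print(first_plausible_patch(buggy, 1, candidates, tests, "max2"))
```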
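Traffic Accident Risk Analysis Using Spatio-Temporal Data
Degree type: Doctorate by coursework (課程博士). Examination committee: (Chief examiner) Professor Ryosuke Shibasaki (柴崎 亮介), University of Tokyo; Professor Yukio Sadahiro (貞広 幸雄), University of Tokyo; Associate Professor Wataru Takeuchi (竹内 渉), University of Tokyo; Associate Professor Yoshihide Sekimoto (関本 義秀), University of Tokyo. University of Tokyo (東京大学)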
GAMMA: Revisiting Template-based Automated Program Repair via Mask Prediction
Automated program repair (APR) aims to fix software bugs without human
intervention and template-based APR has been widely investigated with promising
results. However, it is challenging for template-based APR to select the
appropriate donor code, which is an important repair ingredient for generating
candidate patches. Inappropriate donor code may cause plausible but incorrect
patch generation even with correct fix patterns, limiting the repair
performance.
In this paper, we revisit template-based APR and propose GAMMA, which
directly leverages large pre-trained language models for donor code generation.
Our main insight is that instead of retrieving donor code in the local buggy
file, we can directly predict the correct code tokens based on the context code
snippets and repair patterns by a cloze task. Specifically, (1) GAMMA revises a
variety of fix templates from state-of-the-art template-based APR techniques
(i.e., TBar) and transforms them into mask patterns. (2) GAMMA adopts a
pre-trained language model to predict the correct code for masked code as a
fill-in-the-blank task. The experimental results demonstrate that GAMMA
correctly repairs 82 bugs on Defects4J-v1.2, a 20.59% (14 bugs) and 26.15%
(17 bugs) improvement over the previous state-of-the-art template-based
approach TBar and learning-based approach Recoder, respectively. Furthermore,
GAMMA repairs 45 and 22 bugs on the additional Defects4J-v2.0 and QuixBugs
benchmarks, respectively, indicating its generalizability in addressing the
dataset overfitting issue. We also show that adopting other pre-trained
language models can provide substantial advancement, e.g., CodeBERT-based and
ChatGPT-based GAMMA fix 80 and 67 bugs on Defects4J-v1.2, respectively,
indicating the scalability of GAMMA. Overall, our study highlights the
promising future of adopting pre-trained models to generate correct patches
on top of fix patterns.
Comment: Accepted to the 38th IEEE/ACM International Conference on Automated
Software Engineering (ASE 2023).
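The idea of turning a fix template into a mask pattern and filling it as a cloze task can be sketched in Python. The mask pattern below, which masks the method name of each call in a statement, is a hypothetical stand-in for one of the templates GAMMA transforms, and `fill_mask` is a placeholder callback; in GAMMA that role is played by a pre-trained language model that predicts the masked tokens from the surrounding context.

```python
import re
from typing import Callable, List

MASK = "<mask>"


def mask_method_names(stmt: str) -> List[str]:
    """Hypothetical mask pattern: one variant per method call `.name(`,
    with that method name replaced by a mask token."""
    variants = []
    for m in re.finditer(r"\.(\w+)\(", stmt):
        variants.append(stmt[:m.start(1)] + MASK + stmt[m.end(1):])
    return variants


def generate_patches(stmt: str, fill_mask: Callable[[str], List[str]]) -> List[str]:
    """Fill each masked variant with the predicted tokens (a cloze task)."""
    patches = []
    for masked in mask_method_names(stmt):
        for token in fill_mask(masked):
            candidate = masked.replace(MASK, token, 1)
            if candidate != stmt:  # skip "patches" identical to the buggy code
                patches.append(candidate)
    return patches


# Stub predictor standing in for a pre-trained model (assumed output).
def toy_fill_mask(masked: str) -> List[str]:
    return ["size", "length"]


print(generate_patches("int n = list.length();", toy_fill_mask))
# ['int n = list.size();']
```

Keeping the template logic (where to mask) separate from the predictor (what to fill in) is what lets a different pre-trained model be swapped in, which mirrors the CodeBERT and ChatGPT variants mentioned above.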
A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair
Large Language Models (LLMs) have been gaining increasing attention and
demonstrated promising performance across a variety of Software Engineering
(SE) tasks, such as Automated Program Repair (APR), code summarization, and
code completion. For example, ChatGPT, the latest black-box LLM, has been
investigated by numerous recent research studies and has shown impressive
performance in various tasks. However, there is a potential risk of data
leakage, since these LLMs are usually closed-source with unknown training
details, e.g., pre-training datasets.
In this paper, we seek to review the bug-fixing capabilities of ChatGPT on a
clean APR benchmark with different research objectives. We first introduce a
new benchmark with buggy programs and the corresponding fixed programs from
competitive programming problems released from 2023 onward, after the training
cutoff point of ChatGPT. The results on this benchmark show that ChatGPT is able
to fix 109 out of 151 buggy programs using the basic prompt within 35
independent rounds, outperforming the state-of-the-art LLMs CodeT5 and PLBART by
27.5% and 62.4% prediction accuracy, respectively. We also investigate the
impact of three types of prompts, i.e., problem description, error feedback,
and bug localization, which lead to 34 additional fixed bugs. Besides, we
discuss the interactive nature of ChatGPT to illustrate the capacity of a
dialog-based repair workflow, which fixes 9 additional bugs. Inspired by these
findings, we further pinpoint various challenges and opportunities for advanced
SE research equipped with such LLMs (e.g., ChatGPT) in the near future. More
importantly, our work calls for more research on reevaluating the achievements
obtained by existing black-box LLMs across various SE tasks, not limited to
ChatGPT on APR.
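Two evaluation choices in this abstract, restricting the benchmark to problems released after the model's training cutoff and counting a bug as fixed if any of several independent rounds succeeds, can be made concrete with a short Python sketch. The `Problem` record, the cutoff date, and the `attempt` callback are hypothetical, and the paper's exact counting protocol may differ.

```python
from datetime import date
from typing import Callable, List, NamedTuple


class Problem(NamedTuple):
    pid: str
    published: date


def leakage_free(problems: List[Problem], cutoff: date) -> List[Problem]:
    """Keep only problems published on or after `cutoff` (assumed to lie
    after the model's training-data cutoff), to reduce data-leakage risk."""
    return [p for p in problems if p.published >= cutoff]


def fixed_within_rounds(attempt: Callable[[int], bool], rounds: int) -> bool:
    """A bug counts as fixed if any independent round yields a passing
    program; `attempt(i)` is a hypothetical callback that queries the model
    once and runs the tests. Stops at the first success."""
    return any(attempt(i) for i in range(rounds))


probs = [Problem("A", date(2022, 11, 3)), Problem("B", date(2023, 2, 14))]
print([p.pid for p in leakage_free(probs, date(2023, 1, 1))])  # ['B']
```

Under this counting, the reported 109 fixes out of 151 programs corresponds to a fix rate of about 72.2%.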
Spatio-Temporal Adaptive Embedding Makes Vanilla Transformer SOTA for Traffic Forecasting
With the rapid development of the Intelligent Transportation System (ITS),
accurate traffic forecasting has emerged as a critical challenge. The key
bottleneck lies in capturing the intricate spatio-temporal traffic patterns. In
recent years, numerous neural networks with complicated architectures have been
proposed to address this issue. However, advances in network architectures
have yielded diminishing performance gains. In this study, we
present a novel component called spatio-temporal adaptive embedding that can
yield outstanding results with vanilla transformers. Our proposed
Spatio-Temporal Adaptive Embedding transformer (STAEformer) achieves
state-of-the-art performance on five real-world traffic forecasting datasets.
Further experiments demonstrate that spatio-temporal adaptive embedding plays a
crucial role in traffic forecasting by effectively capturing intrinsic
spatio-temporal relations and chronological information in traffic time series.
Comment: Accepted as a CIKM 2023 short paper.
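The abstract does not spell out the embedding's shape, so the following Python sketch rests on an assumption: the spatio-temporal adaptive embedding is treated as a learnable tensor with one vector per (time step, node) pair, shared across all samples and concatenated to the projected input features before they enter a vanilla transformer. Plain lists keep the shapes explicit; a real model would use a tensor library and learn the embedding by backpropagation.

```python
import random
from typing import List

Tensor3 = List[List[List[float]]]  # indexed [time step][node][feature]


def init_adaptive_embedding(T: int, N: int, d_a: int, seed: int = 0) -> Tensor3:
    """One free parameter vector per (time step, node) pair; randomly
    initialised here and, in a real model, updated during training."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.02) for _ in range(d_a)] for _ in range(N)]
            for _ in range(T)]


def concat_embedding(x: Tensor3, e_a: Tensor3) -> Tensor3:
    """Concatenate input features (T, N, d_f) with the embedding (T, N, d_a)
    along the feature axis, giving (T, N, d_f + d_a)."""
    if len(x) != len(e_a) or any(len(xt) != len(et) for xt, et in zip(x, e_a)):
        raise ValueError("x and e_a must share the (T, N) dimensions")
    return [[xv + ev for xv, ev in zip(xt, et)] for xt, et in zip(x, e_a)]


T, N, d_f, d_a = 12, 3, 2, 4  # e.g. 12 past steps, 3 sensors (hypothetical)
x = [[[float(t), float(n)] for n in range(N)] for t in range(T)]
h = concat_embedding(x, init_adaptive_embedding(T, N, d_a))
print(len(h), len(h[0]), len(h[0][0]))  # 12 3 6
```

Because the embedding is indexed by position in time and by node rather than computed from the input values, it can absorb node-specific and time-specific patterns that a plain transformer would otherwise have to infer.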
Backdooring Neural Code Search
Reusing off-the-shelf code snippets from online repositories is a common
practice, which significantly enhances the productivity of software developers.
To find desired code snippets, developers resort to code search engines through
natural language queries. Neural code search models are hence behind many such
engines. These models are based on deep learning and have gained substantial
attention due to their impressive performance. However, the security aspect of
these models is rarely studied. In particular, an adversary can inject a
backdoor into neural code search models, causing them to return buggy or even
vulnerable code with security/privacy issues. This may impact the downstream
software (e.g., stock
trading systems and autonomous driving) and cause financial loss and/or
life-threatening incidents. In this paper, we demonstrate such attacks are
feasible and can be quite stealthy. By simply modifying one variable/function
name, the attacker can make buggy/vulnerable code rank in the top 11%. Our
attack BADCODE features a special trigger generation and injection procedure,
making the attack more effective and stealthy. The evaluation is conducted on
two neural code search models and the results show our attack outperforms
baselines by 60%. Our user study demonstrates that our attack is twice as
stealthy as the baseline in terms of F1 score.
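The claim that buggy code can be pushed into the top 11% of results refers to a rank-based measure. As a hedged illustration (the paper's exact metric is not given in the abstract), a normalized rank can be computed as the target snippet's rank divided by the number of candidates, with ties counted pessimistically; lower values mean the target appears nearer the top of the results.

```python
from typing import Sequence


def normalized_rank(scores: Sequence[float], target: int) -> float:
    """Rank of scores[target] among all candidates (1 = best), divided by the
    number of candidates and expressed as a percentage. Ties are counted
    against the target."""
    s = scores[target]
    ahead = sum(1 for i, v in enumerate(scores) if i != target and v >= s)
    return 100.0 * (ahead + 1) / len(scores)


def in_top_percent(scores: Sequence[float], target: int, pct: float) -> bool:
    """True if the target's normalized rank is within the top `pct` percent."""
    return normalized_rank(scores, target) <= pct


scores = [0.91, 0.42, 0.77, 0.13, 0.55, 0.38, 0.60, 0.29, 0.84, 0.05]
print(normalized_rank(scores, 2))  # 30.0
```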
A Survey of Source Code Search: A 3-Dimensional Perspective
(Source) code search has attracted wide attention from software engineering researchers
because it can improve the productivity and quality of software development.
Given a functionality requirement usually described in a natural language
sentence, a code search system can retrieve code snippets that satisfy the
requirement from a large-scale code corpus, e.g., GitHub. To realize effective
and efficient code search, many techniques have been proposed successively.
These techniques improve code search performance mainly by optimizing three
core components, namely the query understanding component, the code
understanding component, and the query-code matching component. In this paper, we provide a
3-dimensional perspective survey for code search. Specifically, we categorize
existing code search studies into query-end optimization techniques, code-end
optimization techniques, and match-end optimization techniques according to the
specific components they optimize. Considering that each end can be optimized
independently and contributes to the code search performance, we treat each end
as a dimension. Therefore, this survey is 3-dimensional in nature, and it
provides a comprehensive summary of each dimension in detail. To understand the
research trends of the three dimensions in existing code search studies, we
systematically review 68 relevant studies. Different from existing code
search surveys that only focus on the query end or code end or introduce
various aspects shallowly (including codebase, evaluation metrics, modeling
technique, etc.), our survey provides a more nuanced analysis and review of the
evolution and development of the underlying techniques used in the three ends.
Based on a systematic review and summary of existing work, we outline several
open challenges and opportunities at the three ends that remain to be addressed
in future work.
Comment: Submitted to ACM Transactions on Software Engineering and Methodology.
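The three ends can be seen in even the simplest retrieval pipeline. The Python sketch below is a deliberately minimal, hypothetical baseline rather than any surveyed technique: query-end processing lowercases and tokenizes the query, code-end processing splits identifiers on camelCase and snake_case boundaries, and the match end ranks snippets by the cosine similarity of their bag-of-words vectors. Each function can be replaced independently, which mirrors the survey's view of the three ends as separate dimensions.

```python
import math
import re
from collections import Counter
from typing import List


def query_tokens(query: str) -> Counter:
    """Query end: lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", query.lower()))


def code_tokens(code: str) -> Counter:
    """Code end: split camelCase, then split on non-alphanumerics (incl. '_')."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", code)
    return Counter(t.lower() for t in re.findall(r"[A-Za-z0-9]+", spaced))


def cosine(a: Counter, b: Counter) -> float:
    """Match end: cosine similarity of two bag-of-words vectors."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, corpus: List[str]) -> List[int]:
    """Indices of corpus snippets, best match first."""
    q = query_tokens(query)
    scores = [cosine(q, code_tokens(c)) for c in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


corpus = ["def readFile(path): return open(path).read()",
          "def add(a, b): return a + b"]
print(search("read a file", corpus))  # [0, 1]
```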
An Energy-efficient Rate Adaptive Media Access Protocol (RA-MAC) for Long-lived Sensor Networks
We introduce an energy-efficient Rate Adaptive Media Access Control (RA-MAC) algorithm for long-lived Wireless Sensor Networks (WSNs). Previous research shows that the dynamic and lossy nature of wireless communications is one of the major challenges to reliable data delivery in WSNs. RA-MAC achieves high link reliability in such situations by dynamically trading off data rate for channel gain. The extra gain reduces the packet loss rate, which in turn reduces energy expenditure because fewer retransmissions are needed. We achieve this at the expense of raw bit rate, which generally far exceeds the application’s link requirement. To minimize communication energy consumption, RA-MAC selects the optimal data rate based on the estimated link quality at each data rate and an analytical model of energy consumption. Our model shows how the selected data rate depends on channel conditions in order to minimize energy consumption. We have implemented RA-MAC in TinyOS for an off-the-shelf sensor platform (the TinyNode) on top of a state-of-the-art WSN Media Access Control protocol, SCP-MAC, and evaluated its performance by comparing our implementation with the original SCP-MAC in both simulations and experiments.
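The rate-selection idea can be sketched with a simple energy model. Assuming constant transmit power P_tx, independent transmission attempts, and unlimited retransmissions, the expected number of attempts at rate r is 1/PRR(r), so the expected energy per delivered packet is E(r) = (P_tx * L / r + E_o) / PRR(r), where L is the packet length in bits, E_o is a fixed per-attempt overhead, and PRR is the packet reception ratio. This is a hypothetical model, not RA-MAC's published one, and the rates and PRR values below are made-up examples.

```python
import math
from typing import Dict, Optional


def energy_per_packet(rate_bps: float, prr: float, bits: int,
                      tx_power_w: float, overhead_j: float = 0.0) -> float:
    """Expected joules per delivered packet: one attempt costs airtime energy
    plus a fixed overhead, and the expected number of attempts is 1/prr."""
    if prr <= 0.0:
        return math.inf
    return (tx_power_w * bits / rate_bps + overhead_j) / prr


def select_rate(prr_by_rate: Dict[float, float], bits: int, tx_power_w: float,
                min_rate_bps: float = 0.0, overhead_j: float = 0.0) -> Optional[float]:
    """Pick the rate minimising expected energy among rates that meet the
    application's minimum rate and have a usable link."""
    feasible = [r for r, p in prr_by_rate.items() if r >= min_rate_bps and p > 0.0]
    if not feasible:
        return None
    return min(feasible, key=lambda r: energy_per_packet(
        r, prr_by_rate[r], bits, tx_power_w, overhead_j))


# Hypothetical link estimates: lower rates buy link margin (higher PRR).
lossy = {76800.0: 0.30, 38400.0: 0.90, 19200.0: 0.99}
print(select_rate(lossy, bits=800, tx_power_w=0.05))  # 38400.0
```

On a clean link where PRR is close to 1 at every rate, the highest rate minimises energy, so under this model raw bit rate is given up only when the channel is poor.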
- …